The only thing you really need to do to prepare for R training is to install the latest version of R and RStudio. We’ll talk about the difference between R and RStudio on the first day, but for now, just make sure they’re installed. Directions for installing R/RStudio are below. If you run into any problems, check the instructions at R for Data Science Section 1.4.1 and 1.4.2.
NOTE: If you don’t have Administrative privileges on your computer, you will have to submit an IT HelpDesk Ticket (need to be on VPN or NPS network to access this link) to install R and RStudio, which could take a while. PLEASE PLAN AHEAD!
Even if you already have R or RStudio installed, please install the latest versions of both programs. R recently went through a major version change from 3.x to 4.x with some potential code-breaking changes. The latest versions are needed to ensure everyone’s code behaves the same way.
If you are attending Day 4: R packages and version control, you will need to install Git for Windows, RTools, and devtools and roxygen2 packages.
Download the latest 64-bit Git for Windows by clicking on the “Click here to download” link at the top, and installing the file. Once installed, RStudio typically can find it.
Download files and installation instructions for Rtools are available on CRAN’s RTools page. It’s a large file and may require admin privileges to install, so be sure to install it prior to training. You must also be using R 4.0 or higher for this training, so be sure to download Rtools4.
After you install Rtools, you’ll want to install the devtools package. The devtools package allows you to install packages from GitHub, and it is the easiest way for others to install your packages in their R environment. Run the code chunk below to make sure everything is installed and running correctly. The devtools package has a lot of dependencies, and you often have to install new packages or update existing ones during the process. If you’re asked to update packages while trying to install devtools, you should update them. The most common reason a devtools install fails is an outdated version of one of its dependencies.
install.packages('devtools')
library(devtools)
The roxygen2 package is a dependency of devtools, and it should be installed if you successfully installed devtools. However, it’s always good to check that it installed properly. The roxygen2 package helps with package documentation. The usethis package is relatively new, and some features that used to live in devtools now live in usethis.
library(roxygen2)
library(usethis)
Once these packages load without errors, you’re all set. If you have any issues making this work, contact Kate Miller or Sarah Wright prior to training, and we’ll help you troubleshoot.
I will leave you to google the full definition of functional programming and dive down that rabbit hole on your own. This isn’t about whether code “works”; it is a technical term. In short, a “functional” is a higher-order function that takes a function as one of its inputs. For our purposes, “functional programming” will focus on iterative functionals (the apply family and map) and how to write functions that can be passed to functionals.
The goal of functional programming is more stable, transparent, and reliable code.
This module will provide a look at simple and moderately complex functions in R. We will look at how to make a function, and then at how to apply that function in iteration. The goal is to equip you with the two most powerful tools any R user can have: 1) the ability to create a function, and 2) the ability to use that function to get a lot of repetitive tasks done quickly. You already learned a bit about the second one on Day 4 (iteration), where you saw us using some functions.
Although we are calling this the advanced training, the approaches here are intended to get you started; they are not exhaustive. As you practice these fundamentals, you will quickly find better ways to get things done.
Thomas - A function is a container for a series of steps that are performed on data. It has fixed expectations about input and output; another way of saying that is that a function expects specific patterns for its input and output. A function is also an expression of what another person thinks the right way to do something is. The nice thing is that if you don’t like all that, you can write your own function - but be selective about that.
JP - Functions are a way of taking a large problem and breaking it down into simpler, discrete steps. Each function focuses on one step, which makes it easier to work on that step in isolation. You can then reuse the functions to help solve new problems in the future.
Everything is a function. But what is it really? A function usually has 3 components - the function name, arguments, and the body or source code.
Hang on, what was that ‘usually’ jazz? Okay, there are ‘named functions’ and ‘anonymous functions.’ The difference is that when you plan to reuse a function, you give it a name (3 components). If you don’t plan on ever using it again, you don’t give it a name (2 components), and it is called an ‘anonymous function.’ I am going to show you examples of both, but won’t get too hung up on the taxonomy.
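A minimal sketch of both forms (square is a made-up example function, just for illustration):

```r
# A named function: defined once, reusable by name (3 components)
square <- function(x) {
  x^2
}
square(4)  # 16

# An anonymous function: defined inline, used once, never named (2 components)
sapply(1:3, function(x) x^2)  # 1 4 9
```

The anonymous version is handy when the function is trivial and only needed inside a single functional call.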
Let’s look at mean(x) as an example:
The name of the function is simply “mean”. Function names should be simple and somewhat intuitive (if your function calculates the mean of your data but you name it “Pat”, that doesn’t make sense). You should also be careful not to give your function the same name as something that exists in base R or in a package you commonly use. This can cause conflicts in your environment and unexpected results. R is pretty good about warning you about this.
The arguments of the function are what you put inside the parentheses. Arguments tell the function what the data are and they tell it how to handle the data. In this case mean(x) is telling the mean function that the data to operate on are x.
Almost all functions have more than one argument; however, most of the time you only specify two or three when you call the function. If you want to know what arguments a function accepts, help() will take you to a (hopefully) useful explanation. There you can see which arguments have defaults, what those defaults are, and when you should think about changing them.
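Besides help(), a quick way to see a function’s arguments and their defaults from the console is args(). Using the default method for mean as an example:

```r
# args() prints a function's argument list, including defaults
args(mean.default)
# function (x, trim = 0, na.rm = FALSE, ...)

# ?mean or help(mean) opens the full documentation page
```

Here you can see that na.rm defaults to FALSE, which matters in the examples below.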
The last bit of a function is the source code (“source” for short). This is what the function does. 95% of the time you can safely ignore the source. However, it is useful to look at the source when you want to understand why a function is doing what it is doing, modify a function, see what arguments it can accept, what the defaults are, etc.
How do you get to the source? These days, for non-base packages, a lot of it can be found on GitHub. The fastest way, though, is to type the function name without the () into the console; this returns the underlying source code for that function if it is available.
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x0000000015a79ff0>
## <environment: namespace:base>
## function (formula, data, subset, weights, na.action, method = "qr",
## model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
## contrasts = NULL, offset, ...)
## {
## ret.x <- x
## ret.y <- y
## cl <- match.call()
## mf <- match.call(expand.dots = FALSE)
## m <- match(c("formula", "data", "subset", "weights", "na.action",
## "offset"), names(mf), 0L)
## mf <- mf[c(1L, m)]
## mf$drop.unused.levels <- TRUE
## mf[[1L]] <- quote(stats::model.frame)
## mf <- eval(mf, parent.frame())
## if (method == "model.frame")
## return(mf)
## else if (method != "qr")
## warning(gettextf("method = '%s' is not supported. Using 'qr'",
## method), domain = NA)
## mt <- attr(mf, "terms")
## y <- model.response(mf, "numeric")
## w <- as.vector(model.weights(mf))
## if (!is.null(w) && !is.numeric(w))
## stop("'weights' must be a numeric vector")
## offset <- model.offset(mf)
## mlm <- is.matrix(y)
## ny <- if (mlm)
## nrow(y)
## else length(y)
## if (!is.null(offset)) {
## if (!mlm)
## offset <- as.vector(offset)
## if (NROW(offset) != ny)
## stop(gettextf("number of offsets is %d, should equal %d (number of observations)",
## NROW(offset), ny), domain = NA)
## }
## if (is.empty.model(mt)) {
## x <- NULL
## z <- list(coefficients = if (mlm) matrix(NA_real_, 0,
## ncol(y)) else numeric(), residuals = y, fitted.values = 0 *
## y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w !=
## 0) else ny)
## if (!is.null(offset)) {
## z$fitted.values <- offset
## z$residuals <- y - offset
## }
## }
## else {
## x <- model.matrix(mt, mf, contrasts)
## z <- if (is.null(w))
## lm.fit(x, y, offset = offset, singular.ok = singular.ok,
## ...)
## else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
## ...)
## }
## class(z) <- c(if (mlm) "mlm", "lm")
## z$na.action <- attr(mf, "na.action")
## z$offset <- offset
## z$contrasts <- attr(x, "contrasts")
## z$xlevels <- .getXlevels(mt, mf)
## z$call <- cl
## z$terms <- mt
## if (model)
## z$model <- mf
## if (ret.x)
## z$x <- x
## if (ret.y)
## z$y <- y
## if (!qr)
## z$qr <- NULL
## z
## }
## <bytecode: 0x0000000015aa9e78>
## <environment: namespace:stats>
Some functions have more information to show about what they are doing; some don’t. mean doesn’t have much to show us because it is a compiled function and part of the R source code. If you need to (or already know how to) dig down into compiled functions, you probably don’t need to be in this course. But see reference [1] if you want to try!
When you type in lm you get a lot more information, and you can see that there is a fair amount of code being used to execute lm.
Examples:
+ You want to create separate linear models for the relationship between chloride and specific conductance at 25 sites. You could analyze each site as a separate model, or you could write a function that works through each site.
+ You need to rename 250,000 acoustic recorder files every year. You could resign from your job, or you could write a function.
+ What are your examples?
There are some rules to keep in mind for your functions before we get into developing them.
+ Avoid choosing names that are not intuitive or are already taken.
+ Argument names can (arguably ‘should’) be the same as other parameters or variable names in other functions.
+ One function for each task vs. one function to rule them all? Err on the side of one function for each task, but sometimes you really do want a super function.
+ Make sure your function is interpretable. #Comments accomplish this.
+ Figure out where to draw the line between the function and the functional (iteration).
Below are some of the steps we find ourselves following when we need to develop functions.
+ Verbalize what you want the function to do and do some googling.
+ Identify the pattern or patterns in the data your function will need to operate on.
+ Decide what you want the function to output.
+ Within the expected pattern, set up test cases (test data). I like a clear positive case and a negative case.
+ Do some programming.
+ Test the function until you are satisfied (debugging; see Day 4).
+ Apply it to the data.
Let’s create our first function by changing the default behavior of an existing one.
set.seed(12345) #gives everybody the same data
d<-c(floor(runif(100)*100),NA) #generate random data
mean(x=d) #unexpected result
The response is NA, which (maybe) is not what we want. It is easy enough to address this with mean(d, na.rm=T), but we may not want to type that many times throughout our code.
mean2<- #Tell [R] that I want this new function to be named "mean2"
function(x){ #the function consists of 1 parameter named x (aka the data) The { begins the function source code / expressions.
mean(x,na.rm=T) #in the mean function change the default for na.rm=T
} #close function
Now let’s check its behavior.
mean2(x=d) #more expected result
That handled the NA value without giving an error. What if we want to switch that back?
mean2(x=d, na.rm=F)
How you create your function affects what it can use as ‘arguments’.
When we set up our new function we did not tell it na.rm is an argument. We fixed it in the source code. If you want something as a parameter, it must be listed in the parentheses.
mean3<- function(x,na.rm=T){mean(x=x, na.rm=na.rm)}
mean3(d)
So now we have just made mean with the na.rm set to true and we can change that if needed.
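To see both behaviors side by side, here is a self-contained sketch (mean3 is redefined and d2 is a made-up test vector so the chunk runs on its own):

```r
# Redefine mean3 with na.rm defaulting to TRUE, as above
mean3 <- function(x, na.rm = TRUE) { mean(x = x, na.rm = na.rm) }

d2 <- c(1, 2, 3, NA)  # toy data with one missing value

mean3(d2)                 # 2  -- NA dropped by default
mean3(d2, na.rm = FALSE)  # NA -- default overridden at the call
```

The default handles the common case, but the argument is still there when you need the original behavior.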
mean4<- function(x,na.rm){#very minor change. I deleted the initial parameter value
mean(x=x, na.rm=na.rm)}
mean4(d)
What do you think is going to happen without the initial parameter value?
It didn’t work… or did it? How you set up your function is partly about what you really intend for it to do. I would argue that none of these are wrong; they each have a different use and a different set of assumptions.
mean - assumes that you want an NA if data are missing but might want to change that behavior.
mean2 - assumes you always want to ignore NAs and have no reason to change that behavior
mean3 - assumes you mostly want to ignore NAs but might want to change that behavior
mean4 - assumes nothing and forces you to explicitly state how you are going to handle NA values.
A few final things on the basics of how functions function. You will see a simple one-line function coded in two ways:
mean5<- function(x,na.rm){mean(x=x, na.rm=na.rm)} #always works
mean5<- function(x,na.rm) mean(x=x, na.rm=na.rm) #only works on one line
If a function can be expressed in a single line, you do not need curly brackets. If the function spans more than one line, you must use curly brackets. There are some other details, but that is all you really need to know. My preference is to always use the curly brackets, even when you don’t need to.
I want to come back to what “functionals” are. For our purposes, a functional is a higher-order function that takes a function and data as its inputs (and maybe a few other things as well). The functional iterates that function over the data in a predefined way and returns a certain output (e.g. vector, list, data frame, etc.). This is one of the reasons functionals are preferred over for loops: functionals have an expected behavior. With for loops you are defining the behavior; with functionals, that is largely done for you, and you are just picking the one (or combination) that does what you want.
Which of the following is a correct statement?
+ The for loop isn’t iterating correctly.
+ The functional isn’t iterating correctly.
Can you think of any reasons why we would want to sacrifice the flexibility of a for loop for the rigidity of a functional?
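To make the contrast concrete, here is the same doubling task written both ways (the data are made up for illustration):

```r
d <- c(3, 5, 7)

# For loop: you define all the iteration machinery yourself
out <- numeric(length(d))
for (i in seq_along(d)) {
  out[i] <- d[i] * 2
}

# Functional: the iteration pattern is predefined; you only supply the function
out2 <- sapply(d, function(x) x * 2)

identical(out, out2)  # TRUE
```

With the functional, there is no index to mistype and no output container to pre-allocate; the remaining code is just the function you care about.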
We are going to use two functional families: apply (base R) and map (purrr). There are more out there, but these two are the easiest to understand. We will build some general approaches in apply and then rebuild them in map.
On Day 4, as we learned about iteration, we were already using functions; I told you to ignore the whole “why these are functions” thing for the moment. Let’s go back and pick those apart a little more, and make some useful functions for that Hobo temperature data. We will make a summary function, a plot function, and a modeling function.
Let’s read in that Hobo data again and take a closer look at the functions and functionals.
library(ggplot2);library(magrittr)
#get that data
fNames<-c("APIS01_20548905_2021_temp.csv",
"APIS02_20549198_2021_temp.csv",
"APIS03_20557246_2021_temp.csv",
"APIS04_20597702_2021_temp.csv",
"APIS05_20597703_2021_temp.csv")
fPaths<-paste0("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_Intro/master/Data/", fNames)
HoboList<-lapply(fPaths, FUN=read.csv, skip=1, header=T)%>% #1. read hobo data into a list
lapply(., "[",,1:4)%>% #2. Grab only the first 4 columns. The empty comma is not an error
lapply(., setNames, c("idx","DateTime","T_F","Lum"))%>% #3. set col names
lapply(., dplyr::mutate, DateTime2=as.POSIXct(DateTime, "%m/%d/%y %H:%M:%S", tz="UCT"))%>%#4. format datetime in new variable.
setNames(., fNames) #5. name each one for tracking
This was actually our first functional programming exercise, and we did four of them. See the numbering in the comments above:
1. Take each element of the vector fPaths and pass it to the function (FUN) read.csv one by one, returning a list of data frames. Note that the arguments that come after the function identification are arguments for the specified function.
2. You may be surprised to know that the indexing notation [] is actually a function and can be called using "[" and passing it any row and column information.
3. Rename the columns.
4. Create a new variable that formats the data as datetime so it plots correctly.
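Point 2 is worth seeing in isolation. Here is a small illustration with a toy data frame (df is made up for this example):

```r
df <- data.frame(a = 1:3, b = letters[1:3], c = 4:6)

# Indexing with brackets...
sub1 <- df[, 1:2]
# ...is the same as calling the function "[" directly;
# the empty argument is the missing row index
sub2 <- "["(df, , 1:2)

identical(sub1, sub2)  # TRUE

# which is why lapply can apply "[" across a whole list of data frames
lapply(list(df, df), "[", , 1:2)
```

Once you see that `[` is just a function, passing it to a functional stops looking like magic.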
So, that was the application of canned functions in a functional; on to the new functions.
The way we did summary statistics before was a little clunky. It was only really appropriate if that was the only place we would do it that way. Let’s assume we will do it that way for a number of different datasets in different workflows.
Let’s strategize a little. I have two phases to consider. The function and the iteration.
Function considerations
+ Hobo data has 3 columns. Is this just for Hobo data? It doesn’t have to be. Maybe the column number will change if I decide to analyze light data at some point, so I probably want the target column to be an argument I can change.
+ I want to get the output for a bunch of simple statistics.
+ All will operate on the same data.
+ All have singular outputs.
+ na.rm defaults to FALSE in all of them, and I would like it to be TRUE.
+ I am inputting a list of data, but I want a data frame (or something easily coerced into one) back.
+ I want to be sure I am properly tracking the names of columns and files.
Iteration considerations
+ I am inputting a list of data frames, but I want a single data frame in return that summarizes all of them.
+ I want to be sure I am properly tracking the names of columns and files.
Now that I have put some thought into it, let’s start coding. Note on the code: it is primarily intended for demonstration purposes, and can be done more cleanly in purrr::map.
hobo_summary <- function(x, col=3) {
  funs <- c(mean, median, sd, mad, IQR) #list of functions
  unlist( #unlist simplifies somewhat ugly output
    lapply(funs, function(f){f(x[,col], na.rm = TRUE)})%>%
      setNames(.,c("mean", "median", "sd", "mad", "IQR"))
  )
} #credit to [5] for this general example idea
summarized_data<-lapply(HoboList, FUN=hobo_summary)%>%
do.call("rbind", . )%>%
as.data.frame( . )
Questions:
+ How many arguments does hobo_summary take? How many have a default value?
+ What is funs?
+ What is lapply(funs, function(f){f(x[,col], na.rm = TRUE)}) passing to the anonymous function?
+ What is that anonymous function doing?
+ What is lapply(HoboList, FUN=hobo_summary) passing to hobo_summary?
+ What are the benefits and drawbacks of setting the column names in hobo_summary?
+ What is unlist doing?
I find that rapidly generating a pile of plots is often useful for some initial QA/QC. Do my data look about like I expect? This allows me to quickly generate a few or hundreds of plots and flag some data for closer inspection.
Strategy time
Function considerations
+ I need to make a simple plot. plot() would work fine, but I find ggplot makes saving the plot to a variety of formats easier because of ggsave.
+Hobo data has 3 columns.
+Maybe some files have some or all data missing. ggplot handles missing data reasonably well
+Time series data. ggplot can plot that just fine.
+ I want a plot title on the plot, not in the file name, and I only know the titles relative to the element names in the list.
Iteration considerations
+ In each file, the column locations and the data contained are the same.
+ Each of the 5 files has a similar name - this could be useful for iteration.
+ I strongly suspect missing data.
+ I want to name the output plots using their source file names. I can name the elements in the data list so I know which is which, but lapply doesn’t pass that information along. I can reference a separate vector, but that reduces the flexibility of the code. If I am going to use lapply, this means I need to get a little creative.
ggplotCustom<-function(i, j, pattern=".csv", replacement="_plot.pdf", path=choose.dir(), device="pdf", height=5, width=5,units="in"){
p<-ggplot(data = j[[i]], aes(x=DateTime2, y=T_F))+
geom_point()+
ggtitle(names(j)[i])
ggsave(filename=gsub(pattern=pattern, replacement = replacement, names(j)[i]),
path=path, plot=p, device=device,
height=height, width=width, units=units)
}
#do some quick testing
ggplotCustom(i=1, j=HoboList, path= "C:/Users/tparr/Downloads/Training_Output/") #test to see if function is working for a positive case
ggplotCustom(i=3, j=HoboList, path= "C:/Users/tparr/Downloads/Training_Output/") #test to see how it behaves on a negative case
#now iterate
lapply(seq_along(HoboList), FUN=ggplotCustom, HoboList, path="C:/Users/tparr/Downloads/Training_Output/")
Someone’s head just exploded with all that, so let’s pause and unpack it, because I threw a lot in here. Let’s start with the logic in the arguments list: function(i, j, pattern=".csv", replacement="_plot.pdf", path=choose.dir(), device="pdf", height=5, width=5, units="in"):
What is i?
In this case, i is going to be an index running from 1 to the number of list elements.
Where are we using pattern and replacement?
Then we defined a bunch of stuff for ggsave. Saving is pretty fiddly and depending on how I am going to use the function I may want to change those save arguments without rewriting the entire function. If I am happy with the defaults, I don’t need to state them each time.
choose.dir - if you don’t say where the output is going to go, you get prompted to select a directory. My intent is that I will always specify the path, but maybe I will forget. Then again, I am going to end up specifying it on each step of the iteration anyway.
Now let’s look at how this was fed into lapply: lapply(seq_along(HoboList), FUN=ggplotCustom, HoboList, path="C:/Users/tparr/Downloads/Training_Output/"):
seq_along basically creates a vector from 1 to the number of list elements.
Where is i? The way lapply works is that it takes the first element of the list and passes it to the first argument of the specified function. In the call above, the first element produced by seq_along is 1, and the first argument of ggplotCustom is i, so it basically says i=1. If we had not used seq_along, lapply would instead pass the first element of HoboList, which is a data frame. That is not undesirable behavior, but in this case we want to be able to reference back to where that data frame sits in the list so we can extract its name.
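A minimal sketch of why iterating over the index matters, with a toy named list (x and named_means are made up for illustration):

```r
x <- list(a = 1:3, b = 4:6)

seq_along(x)  # 1 2

# lapply over the index rather than the elements,
# so names(x)[i] is still reachable inside the function
named_means <- lapply(seq_along(x), function(i) {
  data.frame(name = names(x)[i], mean = mean(x[[i]]))
})
do.call(rbind, named_means)
```

If we had written lapply(x, ...) instead, each call would receive only the element’s value and have no way to recover its name.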
As you advance in R or work with more complex datasets, you will want to iterate your analyses. We will work through an example of iterating a linear model analysis for the Hobo data. The question is simple: how does light level affect air temperature?
lm_ls<-function(data,x){mod<-lm(x, data=data); return(mod)}
modlist<-lapply(HoboList[c(2,4,5)], lm_ls, T_F~Lum) #I know some are missing light data
We can now extract some stuff and do some diagnostics. A lot of common tools can be used in a functional.
lapply(modlist, summary)
lapply(modlist, plot)
lapply(modlist, coef)
That was nice but it wasn’t in the best format for further use and it wasn’t what we really needed. Let’s drill down to an output we like.
lm_stats<-function(mod){ mod_sum<-summary(mod) #may or may not be worthwhile
out<-data.frame(
intercept= coef(mod)[[1]],
slope= coef(mod)[[2]],
slp_pval=mod_sum$coefficients[,4][[2]], #see what happens if you run this without the [[2]]
R2_adj= mod_sum$adj.r.squared,
mod_pval= mod_sum$fstatistic %>% {unname(pf(.[1],.[2],.[3],lower.tail=F))})
return(out)
}
m<-lapply(modlist, lm_stats)%>%
do.call(rbind,.)%>%
dplyr::mutate(.,id=rownames(.))%>%
magrittr::set_rownames(.,1:nrow(.))
You know the drill, what are your questions?
+ What makes sense?
+ What doesn’t make sense?
+ Why am I using the “::” (if we haven’t addressed that already)?
We are dealing with small problems, and small problems seldom push the limits of your computer. Large datasets and complex functions can take a long time to process (even after you fully optimize them). In R, this is primarily a function of your processor speed, because R runs on a single processing core. In other words, for something like lapply (or map, or foreach), each iteration is processed sequentially on a single core. It doesn’t need to be that way. Most computers have more than 2 cores; you could be executing independent iteration steps on separate cores and recombining the results. This is called ‘parallel processing’.
There are parallel versions of lapply out there, but they never seem to work quite right. The good news is that there is a relatively new unified framework that can be used with any coding style: base, tidy, and foreach approaches can all be easily parallelized using the functions in the future.apply package.
So let’s explore and time a parallelization of lapply. Your code may vary if not on Windows. This will take 1-2 minutes to run depending on your computer.
library(future.apply) #also attaches the future package, which provides plan()
HoboList2<-c(rep(HoboList,5)) #make the dataset larger
plan("multisession", workers=parallel::detectCores()-1) #initiate a multicore session; using 1 fewer core than the max detected reduces the chance of overwhelming the system
microbenchmark::microbenchmark(
"sequential"={lapply(seq_along(HoboList2), FUN=ggplotCustom, HoboList2, path="C:/Users/tparr/Downloads/Training_Output/")},
"parallel"={future_lapply(seq_along(HoboList2), FUN=ggplotCustom, HoboList2, path="C:/Users/tparr/Downloads/Training_Output/")},
times=5,
unit="s"
)
plan("sequential") #close the multicore session.
My run says that parallelization was 23% faster than sequential. Not a huge speed improvement, but something to keep in mind if a chunk of code is taking ~30 minutes to execute. Now I kind of want to go see if this approach can speed up file.copy.
One important thing to remember is that initiating a parallel session can slow your computer down significantly if not done properly. It is best to test with small data, then scale up.
[1] https://stackoverflow.com/questions/19226816/how-can-i-view-the-source-code-for-a-function
[2] http://rstudio-pubs-static.s3.amazonaws.com/5526_83e42f97a07141e88b75f642dbae8b1b.html
[3] https://stackoverflow.com/questions/45101045/why-use-purrrmap-instead-of-lapply
A two-part system:
- You have something you want to do many times. This is your function.
- You need some way to do it many times. That is your iterative function.
R Markdown is a special file format that you can open in RStudio (go to File > New File, and “R Markdown…” is the 3rd option). R Markdown takes what you put into the .Rmd file, knits the pieces together, and renders it into the format you specified using Pandoc, which is typically installed as part of the RStudio IDE bundle. The knit and render steps generally occur at the same time, and the terms are often used interchangeably. For example, the Knit button knits and renders the .Rmd to your output file. The knit keyboard shortcut, Ctrl + Shift + K, is also super handy.
To get started, you’ll need to make sure you have the rmarkdown package installed. The knitr package, which does a lot of heavy lifting, is a dependency of rmarkdown, so both will be installed with the line of code below.
install.packages("rmarkdown")
Once you have rmarkdown installed, you should be able to go to File > New File > R Markdown…, and start a new .Rmd file. After selecting “R Markdown…”, you will be taken to another window where you can add a title and author and choose the output format. For now, let’s just use the default settings and output to HTML. You should now see an Untitled .Rmd file in your R session with some information already in the YAML and example plain text and code chunks. You can also start with a blank .Rmd, which will be more convenient once you get the hang of it.
The .Rmd file itself consists of 3 main pieces. First, there’s the YAML (“YAML Ain’t Markup Language”) code at the top, which is contained within --- lines. The top YAML is typically where you define features that apply to the whole document, like the output format, authors, parameters (more on that later), whether to add a table of contents, etc. Note that indenting is very important in YAML. In the YAML we use for this website, the line css: custom_styles.css tells Markdown that we want the styles defined in our CSS file, rather than Markdown’s default styling. This is optional, and is mentioned just to show a variation on the default YAML you get when starting a new .Rmd. If you don’t want to use your own custom style sheet, your YAML output line would just be output: html_document.
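A minimal YAML header along these lines might look like the following (the title and author are placeholders, and the css line is optional):

---
title: "Example Report"
author: "Jane Doe"
output:
  html_document:
    toc: true
    css: custom_styles.css
---

Everything under output: html_document is indented to show it modifies that output format; getting that indenting wrong is the most common YAML error.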
This is what it sounds like: it’s just text. You can write anything you want outside of a code chunk, and it will render as if you’re writing in a word processor, rather than as code. Note, though, that special characters like % and & may need to be escaped with a \ before the symbol, particularly if you’re using LaTeX (more on that later).
You can format text using Markdown’s built in functions, like those shown below. For a more detailed list of these formatting functions, check out the R Markdown Cheatsheet. You can also code HTML directly in R Markdown, which I actually find easier the more I get comfortable with HTML. The section below shows how to use the same common styles with R Markdown and HTML and what the output looks like.
The actual text in the .Rmd:
# First-level header
## Second-level header
...
###### Sixth-level header
*italic* or _italic_
**bold** or __bold__
superscript^2^
endash: --
Example sentence: *Picea rubens* is the dominant species in **Acadia National Park**.
The HTML version:
<h1>First-level header</h1>
<h2>Second-level header</h2>
...
<h6>Sixth-level header</h6>
<i>italic</i>
<b>bold</b>
superscript<sup>2</sup>
endash: –
Example sentence: <i>Picea rubens</i> is the dominant species in <b>Acadia National Park</b>.
The text renders as:
italic or italic
bold or bold
superscript2
endash: –
Example sentence: Picea rubens is the dominant species in Acadia National Park.
Code chunks are also what they sound like: chunks of R code (they can be other coding languages too) that run as if they’re in an R script. They’re contained within back ticks and curly brackets, like below.
```{r}
```
You can customize the behavior and output of a code chunk using options within the { }. Common chunk options are below:
echo = TRUE prints the code chunk to the output. FALSE omits the code from output.
results = 'hide' omits results of code chunk from output. show is the default.
include = FALSE executes the code, but omits the code and results from the output.
eval = FALSE does not execute the code chunk, but can print the code, if echo = TRUE.
cache = TRUE caches the output associated with that code chunk, and only reruns the chunk if the code inside it changes. Note that if the objects or data used by the chunk change but the code itself stays the same, the chunk won’t realize it needs to rerun, so be careful with the cache option.
fig.cap = "Caption text" allows you to add a figure caption.
fig.width = 4; fig.height = 3 allow you to set the figure size in inches.
out.width = "50%"; out.height = "50%" allow you to set the figure or table size as a percentage of the container/page size.
message = FALSE, warning = FALSE prevent messages or warnings from chatty packages from being included in the output.
See the R Markdown Cheatsheet for a complete list of code chunk options.
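For instance, a chunk that hides its code, sets the figure size, and adds a caption might combine several of these options like this (the chunk name, figure, and caption are hypothetical):

```{r example-fig, echo = FALSE, fig.cap = "Hypothetical caption.", fig.width = 6, fig.height = 4, message = FALSE}
plot(pressure)
```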
Another trick with code chunk options is that they can be conditional based on the results of another code chunk. For example, I have a QC report that runs 40+ checks on a week's worth of forest data, but the report only includes checks that returned at least 1 value/error. Checks that returned nothing are omitted from the report using conditional eval. I'll show an example of that later.
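As a sketch of how conditional eval works (the object names here are hypothetical):

```r
# A QC check returns a data frame of flagged records (hypothetical example)
errors <- data.frame(plot = "RAM-05", issue = "Missing DBH")

# Flag is TRUE only if the check returned at least 1 record
show_check <- nrow(errors) > 0

# In the .Rmd, the reporting chunk then runs only when the flag is TRUE:
#   ```{r qc-check-1, eval = show_check}
#   knitr::kable(errors)
#   ```
```

Checks that return zero records set their flag to FALSE, and their chunks are silently skipped in the rendered report.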
For all of the output types, the built-in markdown functions, like ‘#’ for level 1 header, render as you expect. Most output types also have an additional code base that allows more advanced and customized features.
The code base for HTML is obviously HTML and cascading style sheets (CSS). HTML is used to structure the content. CSS is used to define the style (e.g., font type and size for each header level).
Rendering to PDF requires a LaTeX engine that runs under the hood. The easiest engine to install is tinytex, and there are instructions and download files on the “Prep for Training” tab of this site.
Rendering to Word can be a helpful way to generate a Results section for a report, or even the entire report for the first draft. It saves having to copy/paste figures and tables from R to Word, and makes life easier if you need to rerun your analysis or update a figure. You just update the .Rmd code and render again to Word. Admittedly I rarely use this option because the functionality is much more limited than outputting to PDF or HTML (see the first Con below). For more details than we'll go into today, check out RStudio's article: Happy collaboration with Rmd to docx.
The officedown package may add functionality, but I haven't played around with it enough (and the times I checked for a certain function, it didn't have it).
There are quite a few packages that can help you make publication-quality, customized tables. The two I see used most frequently are kable() in the knitr package and datatable() in the DT package (not to be confused with the data.table package for data wrangling in R). The learning curve for kable is pretty shallow, and it runs HTML under the hood. The learning curve for DT is a bit steeper, and it runs JavaScript under the hood. That means you can customize and add more features using those languages, if you know them. I tend to stick with kable, because I find HTML/CSS easier to code. If I need more bells and whistles, I use datatable.
First, we’ll load a fake wetland dataset on our GitHub repo to make some summary tables using each package. The code below downloads the dataset from the training GitHub repo, and then summarizes the number of invasive and protected species per site. For both examples, the output format is HTML. If I were outputting to PDF, then I’d need to specify the format as ‘latex’ and use LaTeX code for any custom features not built into kable.
library(tidyverse)
wetdat <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_Advanced/main/data/ACAD_wetland_data_clean.csv")
wetsum <- wetdat %>% group_by(Site_Name, Year) %>%
summarize(num_inv = sum(Invasive), num_prot = sum(Protected),
.groups = 'drop')
The code below creates a simple table that renders in HTML, is only as wide as the records in the table, and has alternating row colors. If you're outputting to PDF, your format will be 'latex' instead of 'html', and you'll need to use LaTeX for any custom formatting/styling that isn't built into kable and kableExtra.
Note also that the version of kableExtra on CRAN currently has a bug that causes collapse_rows() not to function. I’ll show what this does in a minute, but for now, just know that if you want to collapse rows in your kable, you’ll need to install the development version of kableExtra on GitHub. Code for that is below. You’ll need the devtools package installed to install it. If you’ve already loaded kableExtra in your session, you’ll also need to restart your session (Note: Ctrl + Shift + F10 is the fastest way to restart your R session).
devtools::install_github("haozhu233/kableExtra")
library(kableExtra) # for extra kable features
library(knitr) # for kable
wet_kable <- kable(wetsum, format = 'html') %>% # if using pdf, need LaTeX
kable_styling(full_width = FALSE, bootstrap_options = 'striped') #kableExtra function
wet_kable
| Site_Name | Year | num_inv | num_prot |
|---|---|---|---|
| RAM-05 | 2012 | 0 | 3 |
| RAM-05 | 2017 | 1 | 3 |
| RAM-41 | 2012 | 0 | 0 |
| RAM-41 | 2017 | 1 | 0 |
| RAM-44 | 2012 | 1 | 0 |
| RAM-44 | 2017 | 1 | 0 |
| RAM-53 | 2012 | 2 | 0 |
| RAM-53 | 2017 | 3 | 1 |
| RAM-62 | 2012 | 0 | 0 |
| RAM-62 | 2017 | 0 | 0 |
| SEN-01 | 2011 | 0 | 1 |
| SEN-02 | 2011 | 0 | 1 |
| SEN-03 | 2011 | 0 | 0 |
Note the use of pipes in the code above. The great thing about kable and kableExtra is that you can pipe functions together to build out a large table with all kinds of formatting, including conditional formatting. You can also make a custom kable function that has all of the formatting options you want, and just specify the dataset to build the table for. You can then pipe more features onto that function. We’ll show a couple of these examples below.
# custom kable function that requires data, column names and caption
make_kable <- function(data, colnames = NA, caption = NA){
kab <- kable(data, format = 'html', col.names = colnames, align = 'c', caption = caption) %>%
kable_styling(fixed_thead = TRUE,
bootstrap_options = c('condensed', 'bordered', 'striped'),
full_width = FALSE,
position = 'left',
font_size = 12) %>%
row_spec(0, extra_css = "border-top: 1px solid #000000; border-bottom: 1px solid #000000;") %>%
row_spec(nrow(data), extra_css = 'border-bottom: 1px solid #000000;')
}
# use function with wetsum data
wetkab2 <- make_kable(wetsum,
colnames = c("Site", "Year", "# Invasive", "# Protected"),
caption = "Table 1. Summary of wetland data") %>%
scroll_box(height = "250px")
| Site | Year | # Invasive | # Protected |
|---|---|---|---|
| RAM-05 | 2012 | 0 | 3 |
| RAM-05 | 2017 | 1 | 3 |
| RAM-41 | 2012 | 0 | 0 |
| RAM-41 | 2017 | 1 | 0 |
| RAM-44 | 2012 | 1 | 0 |
| RAM-44 | 2017 | 1 | 0 |
| RAM-53 | 2012 | 2 | 0 |
| RAM-53 | 2017 | 3 | 1 |
| RAM-62 | 2012 | 0 | 0 |
| RAM-62 | 2017 | 0 | 0 |
| SEN-01 | 2011 | 0 | 1 |
| SEN-02 | 2011 | 0 | 1 |
| SEN-03 | 2011 | 0 | 0 |
Because colnames and caption default to NA, you don't have to specify them for the function. If you don't, the column names in the table will be the names in the dataframe, and the caption will be omitted.
All columns are centered via align = 'c'. If you wanted the first column to be left-aligned and the next 3 to be centered, you would write align = c('l', rep('c', 3)).
You can size a scroll box by adding height = "###px" or width = "###px" to the argument. Note also that if you add a scroll box, you'll want that line of code to be last. Otherwise you're likely to run into weird issues with kable that prevent the table from rendering. This is why I piped it at the end, instead of adding it to the function.
row_spec(0, ...) adds a black border to the top and bottom of the header, which kable considers row 0.
row_spec(nrow(data)) adds a black border to the bottom of the table, regardless of the number of rows in the table.
If collapse_rows() fails with Error in UseMethod("nodeset_apply"), you're likely running the CRAN version of kableExtra (see the installation note above).
The ifelse() that ends in FALSE just allows the default color to be printed instead of the conditional color. That allows the alternating row colors to remain.
Add the collapse_rows() pipe after any column_spec() or row_spec() calls. You can also collapse on multiple columns, but it is finicky about the order of the pipes. Just use trial and error until you find the order that works. Note also that collapse_rows() only works in the development version of the package (see above for installation instructions).
wetkab3 <- make_kable(wetsum,
colnames = c("Site", "Year", "# Invasive", "# Protected"),
caption = "Table 1. Summary of wetland data") %>%
row_spec(0, extra_css = "border-top: 1px solid #000000; border-bottom: 1px solid #000000;") %>%
column_spec(3, background = ifelse(wetsum$num_inv > 0, "orange", FALSE)) %>%
collapse_rows(1, valign = 'top')
| Site | Year | # Invasive | # Protected |
|---|---|---|---|
| RAM-05 | 2012 | 0 | 3 |
|  | 2017 | 1 | 3 |
| RAM-41 | 2012 | 0 | 0 |
|  | 2017 | 1 | 0 |
| RAM-44 | 2012 | 1 | 0 |
|  | 2017 | 1 | 0 |
| RAM-53 | 2012 | 2 | 0 |
|  | 2017 | 3 | 1 |
| RAM-62 | 2012 | 0 | 0 |
|  | 2017 | 0 | 0 |
| SEN-01 | 2011 | 0 | 1 |
| SEN-02 | 2011 | 0 | 1 |
| SEN-03 | 2011 | 0 | 0 |
Using the same wetsum dataset we created earlier, we’ll make a table using datatable() and will add some of the features that kable() doesn’t have and that usually lead me to choose datatable over kable. We’ll start with a basic example and build on it.
library(DT)
wetdt <- datatable(wetsum, colnames = c("Site", "Year", "# Invasive", "# Protected"))
wetdt
The resulting table has several nice features that kable doesn’t offer.
If you want to show more or fewer entries in your table at a time, you can specify different values by adding options and then specifying values for either pageLength or lengthMenu. The pageLength option takes 1 value and displays that number of entries in the table. The lengthMenu option is similar, but allows you to provide multiple values in a list, which are then added to the dropdown menu in the Show [##] entries box. That allows the user to select the number of entries they want to see at a time.
I also added an option that stops the table from spanning the entire page.
# modify pageLength and lengthMenu
wetdt2 <- datatable(wetsum, colnames = c("Site", "Year", "# Invasive", "# Protected"),
width = "40%",
options = list(pageLength = 10,
lengthMenu = c(5, 10, 20))
)
The class = 'cell-border stripe' argument adds cell borders and striped rows, similar to what bootstrap_options added (striped cells) in kable.
wetdt3 <- datatable(data.frame(wetsum, "Notes" = NA),
width = "40%",
colnames = c("Site", "Year", "# Invasive", "# Protected", "Notes"),
options = list(pageLength = 10),
class = 'cell-border stripe',
filter = list(position = c('top'), clear = FALSE),
editable = list(target = 'cell', disable = list(columns = 1:4))) %>%
formatStyle(3, backgroundColor = styleInterval(0, c('white', "orange")))
wetdt3
There are multiple ways to display an image that's stored on disk. The easiest way to do it is with markdown code in the plain text part of your document, which looks like:
![Map of Region-1 IMD parks](./images/map_of_parks.jpg){width=400px}
Note that inserting a hyperlinked URL is very similar. Just omit the ! and put the URL in parentheses instead of the path to the image. Like: [Link to IMD home page](https://www.nps.gov/im)
You can also use the HTML <img> tag. The code below will produce the exact same image as the markdown code above. I find the <img> tag easier to remember than the markdown version, so I tend to use it more often.
<img src="./images/map_of_parks.jpg" alt = "Map of Region-1 IMD parks" width="400px">
I also like knitr's include_graphics() function, as it can make iteration easier. For example, you can include a bunch of figures in a report based on a list of file names. Below I'm including all the photos in a photopoint folder, and making them only 25% of the width of the page. That puts them in a grid, which can be handy. You can also add breaks between each item in the list, and then they'll plot separately. If you have an analysis with lots of plots that take a while to render, I tend to write the plots to disk and then bring them into the markdown document with include_graphics(). Using include_graphics() also means you're running it in code chunks instead of the plain text, which allows you to dynamically number and reference figure names and specify figure captions at the same time. I'll show that trick in a minute.
```{r photopoints, echo = T, out.width = "25%"}
photos <- list.files("./images/photopoints", full.names = TRUE)
include_graphics(photos)
```
Code chunk options include several handy options to customize figures. These include:
fig.align: defines how the figure will be justified on the page. Options are ‘center’, ‘left’, ‘right’.
fig.cap: adds a caption to the figure. Must be quoted.
fig.height & fig.width: sets the height and width of the figure in inches. Must be numeric and is not quoted.
out.height & out.width: sets the height and width of the plot in the opened file. In this case, you set the dimensions as percents, like out.width = "50%" to make the figure half the width of the page of the rendered document.
You can also set global options so that all figures default to a certain size or alignment. That way, you’d only need to specify figure options in the code chunk if you want to stray from your default settings. Global options can be set like:
knitr::opts_chunk$set(fig.height=4, fig.width=6, fig.align='left')
Dynamic figure and table numbering and cross-referencing is one great feature of Markdown. The easiest way to add it is to output to bookdown's html_document2, instead of rmarkdown's html_document, which is what I've shown so far. You'll need to add a few lines to the YAML code as well, which we show below. The numbered_sections: false and number_sections: false lines prevent bookdown from adding numbering to each section. If you want numbered sections, delete those lines. You'll also need to install bookdown (i.e., install.packages('bookdown')).
output:
bookdown::html_document2:
numbered_sections: false
number_sections: false
fig_caption: true
To see how all of this works, we need to create a couple of plots. Sourcing the code below will generate a fake dataset and create 2 plots. If this doesn't work for some reason, you can copy and paste the code directly from the script named Generate_fake_invasive_data_and_plots.R in our IMD_R_Training_Advanced repository. I'm just trying to save room on the page for code that's not important.
library(dplyr)
library(ggplot2)
devtools::source_url("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_Advanced/main/Generate_fake_invasive_data_and_plots.R")
The code below prints the plots with their respective figure numbers. Note that code chunk names should be alpha-numeric, and can’t include spaces or underscores.
```{r fig-inv-all, fig.cap = "Trends in invasive plant cover in NETN parks.", out.width = "50%"}
invplot_all
```
Figure 1: Trends in invasive plant cover in NETN parks.
```{r fig-inv-acad, fig.cap = "Trends in invasive plant cover in ACAD.", out.width = "50%"}
invplot_ACAD
```
Figure 2: Trends in invasive plant cover in ACAD.
Notice that the figures are numbered consecutively in the order that they appear. For cross-referencing, each code chunk needs a unique name and a figure caption must be defined in the chunk options. To cross-reference, you then write \@ref(fig:code-chunk-name). The same works for tables too, but you use “tab” instead of “fig”. The following text:
As you can see in Figure \@ref(fig:fig-inv-acad), invasive cover appears to be declining. Whereas, invasive cover appears more stable in other NETN parks (Figure \@ref(fig:fig-inv-all)).
renders as:
As you can see in Figure 2, invasive cover appears to be declining. Whereas, invasive cover appears more stable in other NETN parks (Figure 1).
R Markdown: The Definitive Guide. This is Yihui Xie’s book that’s available free online and for purchase. Yihui is one of the main developers at RStudio working on R Markdown and related packages (e.g. knitr, pagedown, bookdown). The online book is searchable, and is often the first place I check, when I can’t remember how to do something.
R Markdown Cookbook. Another great book with Yihui Xie as an author. This book is free online and has a lot of small bite-size tips for customizing R Markdown.
R for Data Science, chapters 27 and 29. The book itself is worth a read cover to cover. The chapters on R Markdown are also very helpful on their own.
RStudio’s R Markdown page: Includes several short videos and lots of tutorials, articles, and a gallery to give an idea of the many things you can do.
kableExtra vignette: Lots of great examples of the different styling/formats you can use with kables and the kableExtra package.
W3 Schools HTML page: This website includes helpful tutorials on HTML and CSS, includes executable examples for just about every HTML tag you can think of, and shows how different browsers render content.
HTML & CSS design and build websites: This book costs about $15 (cheaper if you can find a used copy) and was a very helpful introduction and continual resource, particularly for working with CSS. There’s a Javascript & JQuery book by the same author that’s equally well-done, but was a much steeper learning curve.
CTAN.org: This is LaTeX’s version of CRAN.
Overleaf Online LaTex Editor: Includes a short guide to learn LaTeX, examples for most common uses of LaTeX, and has a built in editor that you can excecute code in.
As you become more familiar with coding in R, your code will become longer and more complex. You will have projects that you revisit and update each year. You may wish to share your code with others and allow them to make suggestions and contributions.
If your code was a Word document, this is the point where you would turn on Track Changes. Think of version control as Track Changes for your code. Version control is much more sophisticated and flexible, which means that it comes with a steeper learning curve. If you come away from this class feeling like you have no idea what you are doing, don’t despair! Embrace the learning curve and remember that you don’t have to be an expert in version control to take advantage of its most useful features.
Why version control?
…because we’ve all been here before. Version control is worth the learning curve because:
There are a variety of version control systems available. We will be using Git, since it is by far the most commonly used system and it is free and open source.
interest in version control systems
You’ll need the following installed and ready to use prior to this course:
Once you've moved on from writing long stand-alone R scripts to writing custom functions and iterating your workflow, the next logical step is to start building your own custom package to more easily apply, document, and share your code.
There are multiple ways to build a package, and it just keeps getting easier thanks to RStudio. The steps below are the ones that most consistently have worked for me as of January 2022.
Once those steps are completed, check that it worked by going to the Git tab and pulling from GitHub. If the down and up arrows are grayed out, something went wrong. If they look like the image below, and you can pull down from GitHub, then you're all set.
The easiest way to create a new package in RStudio is using the usethis package. You’ll first need to have a GitHub account and have RStudio connected to your GitHub account. Once that’s working, you can run the code below in your console.
usethis::create_package("D:/NETN/R_Dev/testpackage") # update to work with your file path
usethis::use_git() # sets up local git for new package
usethis::use_github() # creates new github repo called testpackage.
usethis::use_mit_license() # set license to MIT license (or use a different license.)
usethis::git_default_branch_rename() # renames master to main
The basic building blocks of an R package are defined in the list below. At a bare minimum, there are 2 files and 2 folders that make up an R package. The 2 required files are DESCRIPTION and NAMESPACE. The two folders are “R” and “man”. Several additional files that improve documentation and git workflow are also added by RStudio’s package template. Your Files pane should look something like this:
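As a rough sketch, the skeleton looks something like this (the .Rproj and ignore files come from RStudio's template):

```
testpackage/
├── DESCRIPTION        # package metadata: title, version, author, dependencies
├── NAMESPACE          # exports/imports; generated by roxygen2, so don't edit by hand
├── R/                 # your function .R files live here
├── man/               # .Rd help files; generated from the roxygen comments
├── .Rbuildignore      # files to exclude when building the package
├── .gitignore
└── testpackage.Rproj
```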
The last step before we get to start adding to our R package is to make sure the Build Tools are set up and functioning properly.
#' @title hello
#' @description test package for R training
#' @export
?testpackage::hello
If the text you added shows up in the help file, your Build tools are all set. If the Build exited with status 1, then something is wrong with the roxygen text in your hello.R file. Review the build results to see if you can find the line identified as failing. Also check that each of the lines you added is commented with both symbols (#'), and that the terms following the @ are spelled correctly and don't have a space between the @ and the term.
The last thing to do is delete the hello.R file to clean up the package. You don’t need to delete the hello.Rd file, as it will be deleted the next time you rebuild your package.
Now we get to add to our package and make it useful! We’re going to add a simple function to the package that I use all the time for my workflow. It’s a function that takes the first 3 letters of a genus and species to create a species code. It saves me having to type out full species names when I’m filtering through a lot of data.
To follow along, go to File > New R Script (or key Ctrl + Shift + N) and copy the code below to the script.
#' @title make_sppcode
#' @description Make a 6-letter code with first 3 letters of genus and species
#'
#' @importFrom dplyr mutate select
#' @importFrom stringr word
#'
#' @param data Name of data frame that contains at least one column with Latin names
#' @param sppname Quoted name of the column that contains the Latin names
#'
#' @return Returns a data frame with a new column named sppcode.
#' @export
make_sppcode <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(data, -genus, -species)
return(data2)
}
Note in the Roxygen code above, we added the title, description, and export like we did for hello.R. We added a few more arguments to the Roxygen2 text at the top, including imports, params, and return.
We also added 2 @importFrom lines, which declare the dependencies of your R package. The first imports mutate and select from the dplyr package. The second imports the word function from the stringr package. By adding these 2 lines to the Roxygen, these functions become part of the Namespace of the package (more on that later) and are usable by any function in your package.
If you use only base R functions within your package's functions, you don't need imports. In general, best coding practice is to minimize the number of dependencies, both to reduce the number of packages a user needs to install before using your package, and to make it less likely that your package code will break because a dependency was updated. I use them here to show you the workflow for when you need dependencies (e.g., it's hard for me not to want dplyr at some point in a package). Note also that @importFrom only adds the named functions to the Namespace, so you're less likely to have conflicts with other packages. If you want to make an entire package available to the package Namespace (e.g., I've done this with ggplot2), then you'd write: #' @import ggplot2
Parameters are where you define the inputs to your function. If an input only takes certain arguments, like TRUE/FALSE or a list of park codes, @param is how you document that for the user. Note that if your package functions share the same parameters, you can inherit parameters from other functions by adding #' @inheritParams make_sppcode, instead of copying/pasting them across functions.
The @return argument tells the user what to expect as the output of the function.
The @export argument tells R to export that function into the NAMESPACE file.
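As a sketch of @inheritParams in action, a second (hypothetical) helper could reuse make_sppcode's parameter documentation instead of repeating it:

```r
#' @title count_species
#' @description Count unique Latin names in a data frame (hypothetical helper,
#' included only to illustrate @inheritParams)
#'
#' @inheritParams make_sppcode
#'
#' @return Returns the number of unique species in the data frame.
#' @export
count_species <- function(data, sppname){
  length(unique(data[[sppname]]))
}
```

When the package is documented, count_species() picks up the @param text for data and sppname from make_sppcode().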
There's one last piece of documentation we need to complete before dependencies will be installed along with your package. Open the DESCRIPTION file. It should look like:
You’ll want to update the Title, Author, Maintainer, and Description, which are pretty self-explanatory. As you update your package, you’ll also want to update the Version number. Next we need to add the Imports and Suggests to the DESCRIPTION, which are defined below.
Imports: Packages listed under Imports will be installed at the same time your package is installed. You can also set a minimum version number; if a user has an older version, the newer one will be installed.
Suggests: These packages are not installed at the time your package is installed. Suggests are helpful for external packages that are only used by one or a few functions in your package. For example, one of our packages has a function that imports data directly from our SQL Server, but only a few network staff can access the server. The external packages that the SQL import function uses are listed under Suggests. The SQL import function then checks to see if the suggested packages are installed on the user’s computer. If not, it will stop and print an error that it needs to be installed. We’ll show that workflow later.
You can either manually add these to the DESCRIPTION file like:
OR, you can use the usethis package to do the heavy lifting!
usethis::use_package("dplyr") # for imports which is the default
usethis::use_package("stringr") # for imports which is the default
usethis::use_package("ggplot2", "Suggests") # for suggests
Note also that the License should be MIT + file LICENSE, if you followed the usethis workflow we showed earlier to create the package. I don't know a lot about licenses, other than it's best practice to set one. The MIT license is the most common permissive license; it means your code is open source and allows anyone to copy it with minimal restrictions. If you want all derivatives of your code to also be open source, the GPLv3 license is the most common option (usethis::use_gpl_license()).
We’re finally ready to document the package (note you could have done it after each step). Go to the Build tab and click “Install and Restart” (or Ctrl + Shift + B). Assuming the roxygen and DESCRIPTION were written correctly, you should now see a make_sppcode.Rd in the man folder. You can also check that help works for the function:
?testpackage::make_sppcode
Open your NAMESPACE file. It should look like this:
The Namespace should contain all of the functions you've built for your package as exports, along with all of the external dependencies you use within your functions as imports. As you add more functions and dependencies, they are added here each time you rebuild your package. You can also store data in the namespace, which can then be accessed by your package functions.
The concept of Namespace is a special beast, and can be a bit hard to wrap your head around. In a nutshell, each package has its own environment that contains all the package's functions, dependencies, and objects (e.g., data) that have been defined for that package. This environment is separate from your global environment. When you load a package in your session, the package's environment is accessible, but only through its functions. For example, dplyr is a dependency of our testpackage. When we load testpackage (e.g., library(testpackage)), testpackage's functions can use dplyr. However, if we need dplyr outside of testpackage's functions, we have to load it first.
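A quick way to see this in action (assuming you've built and installed testpackage from the earlier steps; those lines are commented out here):

```r
# library(testpackage)
# make_sppcode(data.frame(Latin_Name = "Carex limosa"), "Latin_Name")
# The call above works even though stringr isn't loaded in your session, because
# word() lives in testpackage's Namespace. But calling word() directly fails:
# word("Carex limosa", 1)  # Error: could not find function "word"
# ...until you load stringr yourself:
library(stringr)
word("Carex limosa", 1)    # "Carex"
```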
Now that the documentation is all set, let’s test that the make_sppcode() function actually works! Try running the code below to see if it works.
library(testpackage)
example_dat <- data.frame(Latin_Name = c("Carex limosa", "Arethusa bulbosa",
"Malaxis unifolia", "Calopogon tuberosus"),
cover = c(10, 40, 10, 50),
stems = c(50, 20, 10, 10))
example_dat2 <- make_sppcode(example_dat, "Latin_Name")
head(example_dat2)
Latin_Name cover stems sppcode
1 Carex limosa 10 50 CARLIM
2 Arethusa bulbosa 40 20 AREBUL
3 Malaxis unifolia 10 10 MALUNI
4 Calopogon tuberosus 50 10 CALTUB
While examples are not required, they are by far the best way to help users understand how to use your functions. They're also breadcrumbs for future you as a reminder of how a function works. Examples work best when you first create a simple fake data set to run with the function, so a user can easily reproduce and run the code on their machine. We just created the example we're going to add in the process of testing the function. The code chunk below shows how to add it. Note that if an example takes a long time to run and you don't want it to run while building or checking the package, you can wrap it in \dontrun{ example code here }.
#' @title make_sppcode
#' @description Make a 6-letter code with first 3 letters of genus and species
#'
#' @importFrom dplyr mutate select
#' @importFrom stringr word
#'
#' @param data Name of data frame that contains at least one column with Latin names
#' @param sppname Quoted name of the column that contains the Latin names
#'
#' @return Returns a data frame with a new column named sppcode.
#'
#' @examples
#' library(testpackage)
#'
#' example_dat <- data.frame(Latin_Name = c("Carex limosa", "Arethusa bulbosa",
#' "Malaxis unifolia", "Calopogon tuberosus"),
#' cover = c(10, 40, 10, 50),
#' stems = c(50, 20, 10, 10))
#'
#' example_dat2 <- make_sppcode(example_dat, "Latin_Name")
#' head(example_dat2)
#'
#' @export
make_sppcode <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(data, -genus, -species)
return(data2)
}
The last thing you need to do before posting your package to GitHub for others to use is to run the R CMD check. You can do this 3 ways: click Check in the Build tab, press Ctrl + Shift + E, or run devtools::check().
Thoughtful and thorough error handling makes your package user friendly. Coding best practice is to put as many checks as possible at the beginning of your function to catch common (or even uncommon) issues, with a clear error or warning message for each check. This is often referred to as "fail early". That way, the user won't wait for code to run, only for it to fail a few minutes later with a vague or misleading error message. If you don't have error handling in your function, the error the user sees is often thrown by the first external function that failed (which has its own built-in error handling), rather than a message pointing at the line of code where your function actually went wrong.
We’re going to add one more function to our testpackage, so we can talk about suggests (i.e. suggested packages that aren’t automatically installed when your package is installed) and error handling. The function will be called theme_IMD and will specify a custom theme for ggplot2, which is one of our suggests. Open a new script, name it theme_IMD, and copy the code chunk into it.
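The original chunk isn't reproduced here, so below is a minimal sketch consistent with the description; the actual theme settings are assumptions:

```r
#' @title theme_IMD
#' @description Custom ggplot2 theme (sketch; theme settings are placeholders)
#' @export
theme_IMD <- function(){
  # Fail early if the suggested package isn't installed on the user's machine
  if(!requireNamespace("ggplot2", quietly = TRUE)){
    stop("Package 'ggplot2' needed for this function to work. Please install it.",
         call. = FALSE)
  }
  # Suggested packages must be called with the package:: prefix
  ggplot2::theme(panel.background = ggplot2::element_blank(),
                 axis.line = ggplot2::element_line(color = "black"))
}
```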
Notice the line that checks whether ggplot2 is installed on the user's machine. If it isn't, the function fails immediately and prints "Package 'ggplot2' needed for this function to work. Please install it." in the console. This check only happens in functions that include that code. Note that for suggests, you also have to use the package:: prefix to call their functions. This is always a messy business for ggplot2…
There are tons of possible checks you can do. I often peek under the hood of well-designed packages to see what types of checks the pros actually use. You can view the code underlying a function by clicking on the function name and pressing the F2 key.
Some examples of checks I commonly use are match.arg(), which makes sure the arguments the user specified are among those the function allows. The stopifnot(class(argument) == 'classtype') check is helpful for ensuring numeric or logical arguments are specified properly. Other checks I often include make sure the data sets the function uses exist in the global environment. The code below is an excerpt from a function in our forestNETN package that compiles tree data. The first part of the code checks the arguments specified by the user. The tryCatch() looks for the COMN_TreesByEvent object: if it exists, it is assigned to tree_vw; if it doesn't, the function exits and prints the error quoted in stop() to the console.
joinTreeData <- function(park = 'all', from = 2006, to = 2021, QAQC = FALSE,
                         locType = c('VS', 'all'), panels = 1:4,
                         status = c('all', 'active', 'live', 'dead'),
                         speciesType = c('all', 'native', 'exotic', 'invasive'),
                         canopyPosition = c("all", "canopy"), dist_m = NA,
                         eventType = c('complete', 'all'), output = 'short', ...){
  # Match args and classes
  status <- match.arg(status)
  park <- match.arg(park, several.ok = TRUE,
                    c("all", "ACAD", "MABI", "MIMA", "MORR", "ROVA", "SAGA", "SARA", "WEFA"))
  stopifnot(class(from) == "numeric", from >= 2006)
  stopifnot(class(to) == "numeric", to >= 2006)
  locType <- match.arg(locType)
  stopifnot(class(QAQC) == 'logical')
  stopifnot(panels %in% c(1, 2, 3, 4))
  output <- match.arg(output, c("short", "verbose"))
  canopyPosition <- match.arg(canopyPosition)
  speciesType <- match.arg(speciesType)
  # Check for tree data in global environment
  tryCatch(tree_vw <- COMN_TreesByEvent,
           error = function(e){stop("COMN_TreesByEvent view not found. Please import view.")}
  )
}
Error handling could take an entire day or more to cover fully, but that’s about all we have time for today. For more detail, Chapter 8 Conditions in the Advanced R book covers this topic quite thoroughly. Another useful resource is Chapter 2.5 Error Handling and Generation in Mastering Software Development in R.
Debugging is another big topic that we only have time to scratch the surface of. For further reading, the best resource I’ve found on debugging is Chapter 22 Debugging in the Advanced R book.
The simplest form of debugging is to load the dependencies and define the objects in your global environment that feed into the function, and then run the code inside the function line by line. A simple example with the make_sppcode function is in the code chunk below. Note that I commented out the lines that start and end the function.
# dependencies
library(stringr)
library(dplyr)
#function args
data <- example_dat
sppname <- "Latin_Name"
#make_sppcode <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(dat, -genus, -species)
## Error in select(dat, -genus, -species): object 'dat' not found
# return(data2)
#}
In the example above, we found that the line with select(dat, -genus, -species) had a typo: dat should have been data.
traceback()
There are several other built-in R functions that can help with debugging. The two I use most often are traceback() and debug(). To show how traceback() works, let’s create a function that we know has an error. Copy this code to your R session and run it. You should see make_sppcode_error show up in your global environment after you run it.
make_sppcode_error <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(dat, -genus, -species)
return(data2)
}
Now try to use the function:
make_sppcode_error(example_dat, sppname = "Latin_Name")
## Error in select(dat, -genus, -species): object 'dat' not found
It should fail, and the error message tells you that object ‘dat’ was not found. You could then go look under the hood of your function to try to find where dat lives. Or, you can use traceback(), which shows you the code and line number that failed. If the failure actually happened inside another function from your package that this function calls, traceback() will tell you that too. Run the code below to see for yourself.
traceback()
debug()
The debug() function allows you to look under the hood of a function and steps through the function code one line at a time. You can see the outputs of each line, and even interact with/change them to test how the function behaves. Once you get the hang of it, you’ll never go back to the low-tech debugging approach I described first.
The code chunk below shows how to start using the debug() function to step through the make_sppcode_error() function. It’s hard to show with R Markdown, but we’ll demo how to walk through the browser in debug() in a minute. Once you run the code below, your console will show a message that starts with “debugging in: make_sppcode_error(data, "Latin_Name")”. You’ll also see a Browse[2]> prompt below, where you can enter one of several options:
n: run the next line of the function
s: step into the function called on the current line
f: finish execution of the current loop or function
c: continue running the function to the end
Q: quit the browser and exit
where: print the current call stack
debug(make_sppcode_error)
make_sppcode_error(data, "Latin_Name")
In our case, we’ll enter n and step our way through the function, printing head(data) as we go to make sure it looks the way we expect. Eventually we’ll find that the function fails on the select(dat, ...) line. Then we’ll exit the browser by entering Q (quit) or c (continue running to the end). When you’re finished, run undebug(make_sppcode_error) so the browser doesn’t open the next time you call the function.
R Packages book: The 2nd Edition is currently under development, with lots of updates to package development workflow (e.g. the usethis package!) and improved examples being added frequently. You can’t go wrong when Hadley Wickham and Jenny Bryan team up on a project.
Advanced R is an excellent resource to learn the more advanced skills and concepts with R programming. The book is pretty advanced and assumes you already have a solid foundation in R programming.
Mastering Software Development in R covers a lot of the same topics as this training, but is a gentler introduction and doesn’t get quite as advanced as Advanced R.
This tab prints all of the code chunks in this document in one long page.
#--------------------
# Prep
#--------------------
install.packages('devtools')
library(devtools)
library(roxygen2)
library(usethis)
#run these in the console without the () to see what lies underneath
mean
lm
set.seed(12345) #gives everybody the same data
d<-c(floor(runif(100)*100),NA) #generate random data
mean(x=d) #unexpected result
mean2<- #Tell [R] that I want this new function to be named "mean2"
function(x){ #the function consists of 1 parameter named x (aka the data) The { begins the function source code / expressions.
mean(x,na.rm=T) #in the mean function change the default for na.rm=T
} #close function
mean2(x=d) #more expected result
mean2(x=d, na.rm=F) #error: mean2 has no na.rm parameter, so this fails with "unused argument"
mean3<- function(x,na.rm=T){mean(x=x, na.rm=na.rm)}
mean3(d)
mean4<- function(x,na.rm){#very minor change. I deleted the initial parameter value
mean(x=x, na.rm=na.rm)}
mean4(d)
mean5<- function(x,na.rm){mean(x=x, na.rm=na.rm)} #always works
mean5<- function(x,na.rm) mean(x=x, na.rm=na.rm) #only works on one line
library(ggplot2);library(magrittr)
#get that data
fNames<-c("APIS01_20548905_2021_temp.csv",
"APIS02_20549198_2021_temp.csv",
"APIS03_20557246_2021_temp.csv",
"APIS04_20597702_2021_temp.csv",
"APIS05_20597703_2021_temp.csv")
fPaths<-paste0("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_Intro/master/Data/", fNames)
HoboList<-lapply(fPaths, FUN=read.csv, skip=1, header=T)%>% #1. read hobo data into a list
lapply(., "[",,1:4)%>% #2. Grab only the first 4 columns. Empty comma is not an error
lapply(., setNames, c("idx","DateTime","T_F","Lum"))%>% #3. set col names
lapply(., dplyr::mutate, DateTime2=as.POSIXct(DateTime, "%m/%d/%y %H:%M:%S", tz="UCT"))%>%#4. format datetime in new variable.
setNames(., fNames) #5. name each one for tracking
hobo_summary <- function(x, col=3) {
funs <- c(mean, median, sd, mad, IQR) #list of functions
unlist( #unlist simplifies somewhat ugly output
y<-lapply(funs, function(f){f(x[,col], na.rm = TRUE)})%>%
setNames(.,c("mean", "median", "sd", "mad", "IQR"))
)
} #credit to [5] for this general example idea
summarized_data<-lapply(HoboList, FUN=hobo_summary)%>%
do.call("rbind", . )%>%
as.data.frame( . )
ggplotCustom<-function(i, j, pattern=".csv", replacement="_plot.pdf", path=choose.dir(), device="pdf", height=5, width=5,units="in"){
p<-ggplot(data = j[[i]], aes(x=DateTime2, y=T_F))+
geom_point()+
ggtitle(names(j)[i])
ggsave(filename=gsub(pattern=pattern, replacement = replacement, names(j)[i]),
path=path, plot=p, device=device,
height=height, width=width, units=units)
}
#do some quick testing
ggplotCustom(i=1, j=HoboList, path= "C:/Users/tparr/Downloads/Training_Output/") #test to see if function is working for a positive case
ggplotCustom(i=3, j=HoboList, path= "C:/Users/tparr/Downloads/Training_Output/") #test to see how it behaves on a negative case
#now iterate
lapply(seq_along(HoboList), FUN=ggplotCustom, HoboList, path="C:/Users/tparr/Downloads/Training_Output/")
lm_ls<-function(data,x){mod<-lm(x, data=data); return(mod)}
modlist<-lapply(HoboList[c(2,4,5)], lm_ls, T_F~Lum) #I know some are missing light data
lapply(modlist, summary)
lapply(modlist, plot)
lapply(modlist, coef)
lm_stats<-function(mod){ mod_sum<-summary(mod) #may or may not be worthwhile
out<-data.frame(
intercept= coef(mod)[[1]],
slope= coef(mod)[[2]],
slp_pval=mod_sum$coefficients[,4][[2]], #see what happens if you run this without the [[2]]
R2_adj= mod_sum$adj.r.squared,
mod_pval= mod_sum$fstatistic %>% {unname(pf(.[1],.[2],.[3],lower.tail=F))})
return(out)
}
m<-lapply(modlist, lm_stats)%>%
do.call(rbind,.)%>%
dplyr::mutate(.,id=rownames(.))%>%
magrittr::set_rownames(.,1:nrow(.))
HoboList2<-c(rep(HoboList,5)) #make the dataset larger
library(future); library(future.apply) #load the packages that provide plan() and future_lapply()
plan("multisession", workers=parallel::detectCores()-1) #initiate a multicore session; use 1 fewer worker than the max cores detected to reduce the chance of overwhelming the system.
microbenchmark::microbenchmark(
"sequential"={lapply(seq_along(HoboList2), FUN=ggplotCustom, HoboList2, path="C:/Users/tparr/Downloads/Training_Output/")},
"parallel"={future_lapply(seq_along(HoboList2), FUN=ggplotCustom, HoboList2, path="C:/Users/tparr/Downloads/Training_Output/")},
times=5,
unit="s"
)
plan("sequential") #close the multicore session.
install.packages("rmarkdown")
#------------------
# R Markdown I
#------------------
knitr::include_graphics("./images/YAML.png")
# First-level header
## Second-level header
...
###### Sixth-level header
*italic* or _italic_
**bold** or __bold__
superscript^2^
endash: --
Example sentence: *Picea rubens* is the dominant species in **Acadia National Park**.
<h1>First-level header</h1>
<h2>Second-level header</h2>
...
<h6>Sixth-level header</h6>
<i>italic</i>
<b>bold</b>
superscript<sup>2</sup>
endash: –
Example sentence: <i>Picea rubens</i> is the dominant species in <b>Acadia National Park</b>.
knitr::opts_chunk$set(results = 'asis')
library(tidyverse)
wetdat <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_Advanced/main/data/ACAD_wetland_data_clean.csv")
wetsum <- wetdat %>% group_by(Site_Name, Year) %>%
summarize(num_inv = sum(Invasive), num_prot = sum(Protected),
.groups = 'drop')
devtools::install_github("haozhu233/kableExtra")
library(kableExtra) # for extra kable features
library(knitr) # for kable
wet_kable <- kable(wetsum, format = 'html') %>% # if using pdf, need LaTeX
kable_styling(full_width = FALSE, bootstrap_options = 'striped') #kableExtra function
wet_kable
wet_kable
# custom kable function that requires data, column names and caption
make_kable <- function(data, colnames = NA, caption = NA){
kab <- kable(data, format = 'html', col.names = colnames, align = 'c', caption = caption) %>%
kable_styling(fixed_thead = TRUE,
bootstrap_options = c('condensed', 'bordered', 'striped'),
full_width = FALSE,
position = 'left',
font_size = 12) %>%
row_spec(0, extra_css = "border-top: 1px solid #000000; border-bottom: 1px solid #000000;") %>%
row_spec(nrow(data), extra_css = 'border-bottom: 1px solid #000000;')
}
# use function with wetsum data
wetkab2 <- make_kable(wetsum,
colnames = c("Site", "Year", "# Invasive", "# Protected"),
caption = "Table 1. Summary of wetland data") %>%
scroll_box(height = "250px")
wetkab2
wetkab3 <- make_kable(wetsum,
colnames = c("Site", "Year", "# Invasive", "# Protected"),
caption = "Table 1. Summary of wetland data") %>%
row_spec(0, extra_css = "border-top: 1px solid #000000; border-bottom: 1px solid #000000;") %>%
column_spec(3, background = ifelse(wetsum$num_inv > 0, "orange", FALSE)) %>%
collapse_rows(1, valign = 'top')
wetkab3
library(DT)
wetdt <- datatable(wetsum, colnames = c("Site", "Year", "# Invasive", "# Protected"))
wetdt
wetdt
# modify pageLength and lengthMenu
wetdt2 <- datatable(wetsum, colnames = c("Site", "Year", "# Invasive", "# Protected"),
width = "40%",
options = list(pageLength = 10,
lengthMenu = c(5, 10, 20))
)
wetdt2
wetdt3 <- datatable(data.frame(wetsum, "Notes" = NA),
width = "40%",
colnames = c("Site", "Year", "# Invasive", "# Protected", "Notes"),
options = list(pageLength = 10),
class = 'cell-border stripe',
filter = list(position = c('top'), clear = FALSE),
editable = list(target = 'cell', disable = list(columns = 1:4))) %>%
formatStyle(3, backgroundColor = styleInterval(0, c('white', "orange")))
wetdt3
wetdt3
{width=400px}
<img src="./images/map_of_parks.jpg" alt = "Map of Region-1 IMD parks" width="400px">
photos <- list.files("./images/photopoints", full.names = TRUE)
include_graphics(photos)
knitr::opts_chunk$set(fig.height=4, fig.width=6, fig.align='left')
output:
  bookdown::html_document2:
    number_sections: false
    fig_caption: true
library(dplyr)
library(ggplot2)
devtools::source_url("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_Advanced/main/Generate_fake_invasive_data_and_plots.R")
invplot_all
invplot_ACAD
As you can see in Figure \@ref(fig:fig-inv-acad), invasive cover appears to be declining. Whereas, invasive cover appears more stable in other NETN parks (Figure \@ref(fig:fig-inv-all)).
knitr::opts_chunk$set(echo = TRUE)
#------------------
# R Packages I
#------------------
knitr::include_graphics("./images/gitshell_init_code.png")
usethis::create_package("D:/NETN/R_Dev/testpackage") # update to work with your file path
usethis::use_git() # sets up local git for new package
usethis::use_github() # creates new github repo called testpackage.
usethis::use_mit_license() # set license to MIT license (or use a different license.)
usethis::git_default_branch_rename() # renames master to main
#------------------
# R Packages 2
#------------------
knitr::include_graphics("./images/Project_Options_Build_Tools.png")
#' @title hello
#' @description test package for R training
#' @export
knitr::include_graphics("./images/Build_hello_example.png")
?testpackage::hello
#' @title make_sppcode
#' @description Make a 6-letter code with first 3 letters of genus and species
#'
#' @importFrom dplyr mutate select
#' @importFrom stringr word
#'
#' @param data Name of data frame that contains at least one column with Latin names
#' @param sppname Quoted name of the column that contains the Latin names
#'
#' @return Returns a data frame with a new column named sppcode.
#' @export
make_sppcode <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(data, -genus, -species)
return(data2)
}
usethis::use_package("dplyr") # for imports which is the default
usethis::use_package("stringr") # for imports which is the default
usethis::use_package("ggplot2", "Suggests") # for suggests
?testpackage::make_sppcode
library(testpackage)
example_dat <- data.frame(Latin_Name = c("Carex limosa", "Arethusa bulbosa",
"Malaxis unifolia", "Calopogon tuberosus"),
cover = c(10, 40, 10, 50),
stems = c(50, 20, 10, 10))
example_dat2 <- make_sppcode(example_dat, "Latin_Name")
head(example_dat2)
#' @title make_sppcode
#' @description Make a 6-letter code with first 3 letters of genus and species
#'
#' @importFrom dplyr mutate select
#' @importFrom stringr word
#'
#' @param data Name of data frame that contains at least one column with Latin names
#' @param sppname Quoted name of the column that contains the Latin names
#'
#' @return Returns a data frame with a new column named sppcode.
#'
#' @examples
#' library(testpackage)
#'
#' example_dat <- data.frame(Latin_Name = c("Carex limosa", "Arethusa bulbosa",
#' "Malaxis unifolia", "Calopogon tuberosus"),
#' cover = c(10, 40, 10, 50),
#' stems = c(50, 20, 10, 10))
#'
#' example_dat2 <- make_sppcode(example_dat, "Latin_Name")
#' head(example_dat2)
#'
#' @export
make_sppcode <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(data, -genus, -species)
return(data2)
}
#------------------
# R Packages III
#------------------
library(stringr)
library(dplyr)
example_dat <- data.frame(Latin_Name = c("Carex limosa", "Arethusa bulbosa",
"Malaxis unifolia", "Calopogon tuberosus"),
cover = c(10, 40, 10, 50),
stems = c(50, 20, 10, 10))
#' @title theme_IMD: custom ggplot2 theme for forestNETN
#'
#'
#' @description This is a custom ggplot2 theme that removes the default panel grids
#' from ggplot2 figures, and makes the axes and tick marks grey instead of black.
#'
#' @return This function must be used in conjunction with a ggplot object, and will return a ggplot object with the custom theme.
#'
#' @examples
#' example_dat <- data.frame(Latin_Name = c("Carex limosa", "Arethusa bulbosa",
#' "Malaxis unifolia", "Calopogon tuberosus"),
#' cover = c(10, 40, 10, 50),
#' stems = c(50, 20, 10, 10))
#' library(ggplot2)
#' p <- ggplot(data = example_dat, aes(x = cover, y = stems)) +
#' geom_point() +
#' theme_IMD()
#' p
#'
#' @export
theme_IMD <- function(){
# Check that suggested package required for this function is installed
if(!requireNamespace("ggplot2", quietly = TRUE)){
stop("Package 'ggplot2' needed for this function to work. Please install it.", call. = FALSE)
}
ggplot2::theme(panel.grid.major = ggplot2::element_blank(),
panel.grid.minor = ggplot2::element_blank(),
panel.background = ggplot2::element_rect(color = '#696969', fill = 'white', size = 0.4),
plot.background = ggplot2::element_blank(),
strip.background = ggplot2::element_rect(color = '#696969', fill = 'grey90', size = 0.4),
legend.key = ggplot2::element_blank(),
axis.line.x = ggplot2::element_line(color = "#696969", size = 0.4),
axis.line.y = ggplot2::element_line(color = "#696969", size = 0.4),
axis.ticks = ggplot2::element_line(color = "#696969", size = 0.4)
)}
joinTreeData <- function(park = 'all', from = 2006, to = 2021, QAQC = FALSE,
                         locType = c('VS', 'all'), panels = 1:4,
                         status = c('all', 'active', 'live', 'dead'),
                         speciesType = c('all', 'native', 'exotic', 'invasive'),
                         canopyPosition = c("all", "canopy"), dist_m = NA,
                         eventType = c('complete', 'all'), output = 'short', ...){
  # Match args and classes
  status <- match.arg(status)
  park <- match.arg(park, several.ok = TRUE,
                    c("all", "ACAD", "MABI", "MIMA", "MORR", "ROVA", "SAGA", "SARA", "WEFA"))
  stopifnot(class(from) == "numeric", from >= 2006)
  stopifnot(class(to) == "numeric", to >= 2006)
  locType <- match.arg(locType)
  stopifnot(class(QAQC) == 'logical')
  stopifnot(panels %in% c(1, 2, 3, 4))
  output <- match.arg(output, c("short", "verbose"))
  canopyPosition <- match.arg(canopyPosition)
  speciesType <- match.arg(speciesType)
  # Check for tree data in global environment
  tryCatch(tree_vw <- COMN_TreesByEvent,
           error = function(e){stop("COMN_TreesByEvent view not found. Please import view.")}
  )
}
# dependencies
library(stringr)
library(dplyr)
#function args
data <- example_dat
sppname <- "Latin_Name"
#make_sppcode <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(dat, -genus, -species)
# return(data2)
#}
make_sppcode_error <- function(data, sppname){
data$genus = word(data[,sppname], 1)
data$species = ifelse(is.na(word(data[,sppname], 2)), "spp.", word(data[,sppname], 2))
data <- mutate(data, sppcode = toupper(paste0(substr(genus, 1, 3),
substr(species, 1, 3))))
data2 <- select(dat, -genus, -species)
return(data2)
}
make_sppcode_error(example_dat, sppname = "Latin_Name")
traceback()
debug(make_sppcode_error)
make_sppcode_error(data, "Latin_Name")
knitr::include_graphics("./images/package_commit.jpg")
devtools::install_github("KateMMiller/testpackage")
We are the people who designed and led the training this week. We hope you found it useful, and that you keep at it!
Andrew Birch
WRD/IMD Water Quality Program Lead
andrew_birch@nps.gov
Ellen Cheng
SER Quantitative Ecologist
ellen_cheng@nps.gov
Kate Miller
NETN/MIDN Quantitative Ecologist
kathryn_miller@nps.gov
Lauren Pandori
CABR Marine Biologist - MEDN
lauren_pandori@nps.gov
Thomas Parr
GLKN Program Manager
thomas_parr@nps.gov
John Paul Schmit
NCRN Quantitative Ecologist
john_paul_schmit@nps.gov
Sarah Wright
MOJN Data Scientist
sarah_wright@nps.gov